ParaText: Scalable Text Modeling and Analysis

Authors

  • Daniel M. Dunlavy
  • Timothy M. Shead
  • Eric T. Stanton

Abstract

Automated analysis of unstructured text documents (e.g., web pages, newswire articles, research publications, business reports) is a key capability for solving important problems in areas including decision making, risk assessment, social network analysis, intelligence analysis, scholarly research, and others. However, as data sizes in these areas continue to grow, scalable processing, modeling, and semantic analysis of text collections become essential. In this paper, we present the ParaText text analysis engine, a distributed-memory software framework for processing, modeling, and analyzing collections of unstructured text documents. Results on several document collections using hundreds of processors illustrate the flexibility, extensibility, and scalability of the entire text modeling process, from raw data ingestion to application analysis.
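The abstract names the stages of the process (raw data ingestion, text modeling, application analysis) without detailing them. The following is a hypothetical, single-process sketch of those generic stages under our own assumptions: a bag-of-words term-document matrix as the text model, a round-robin split of documents across simulated workers to mimic distributed processing, and cosine similarity as the analysis step. None of the function names come from ParaText, and this is not the framework's actual implementation.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Simplistic tokenizer (an assumption): lowercase, keep alphabetic runs only.
    return re.findall(r"[a-z]+", text.lower())


def partition(docs, n_workers):
    # Assign (doc_id, text) pairs round-robin to hypothetical workers,
    # standing in for distributing documents across processors.
    indexed = list(enumerate(docs))
    return [indexed[w::n_workers] for w in range(n_workers)]


def local_counts(chunk):
    # Each simulated worker counts terms only in the documents it owns.
    return {doc_id: Counter(tokenize(text)) for doc_id, text in chunk}


def term_document_matrix(docs, n_workers=2):
    # Ingestion + modeling: per-worker counting, then a merge into one
    # terms-by-documents count matrix over a shared, sorted vocabulary.
    merged = {}
    for chunk in partition(docs, n_workers):
        merged.update(local_counts(chunk))
    vocab = sorted({t for counts in merged.values() for t in counts})
    # Counter returns 0 for terms absent from a document.
    matrix = [[merged[d][t] for d in range(len(docs))] for t in vocab]
    return vocab, matrix


def cosine(u, v):
    # Cosine similarity between two count vectors (0.0 if either is all zeros).
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def most_similar(docs, query_id, n_workers=2):
    # Analysis: return the index of the other document whose column
    # is most similar (by cosine) to the query document's column.
    _, matrix = term_document_matrix(docs, n_workers)
    columns = list(zip(*matrix))  # one column vector per document
    scores = [(cosine(columns[query_id], columns[d]), d)
              for d in range(len(docs)) if d != query_id]
    return max(scores)[1]
```

For example, with `docs = ["parallel text analysis", "scalable parallel text modeling", "Shakespeare paratext"]`, `most_similar(docs, 0)` returns `1`, since document 1 shares two terms with document 0 and document 2 shares none. In a real distributed-memory setting, the merge of per-worker counts is the step that would require inter-process communication (e.g., via MPI); here it is simulated with an in-memory dictionary update, and the resulting matrix does not depend on the number of simulated workers.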

Similar resources

Scaffolding Comprehension and Recall Gaps: Effects of Paratextual Advance Organizers

Although filling the gap in reading comprehension gained momentum with the rise of the top-down approach, Vygotsky’s concept of scaffolding and the dual coding theory provided strong support for the use of paratext to enhance comprehension. Scaffolding is dependent on other-regulation, one type of which is object-regulation. From this vantage point, various types of paratext can function as sou...

Design Framework of a Database for Structured Documents with Object Links

Structured documents often contain character strings whose semantics can naturally be stored as database values or correspond directly to database values. By building bilateral logical links between character strings in documents and the corresponding database values, semantically rich queries become expressible. We have introduced a new ADT, named “paratext,” to model text which has l...

Shakespeare, Text and Paratext

The early modern dramatic paratext is a rich and varied repository of tributes to patrons and readers, where dramatists negotiated or parodied their attitudes towards dramatic publication and their reliance on the medium of print as a source of income and literary reputation. However, the lack of signed dedications or addresses to the reader in the early editions of Shakespeare’s plays has defl...

Inner circles and outer reaches: local and global information-seeking habits of authors in acknowledgment paratext

Introduction. This research investigates paratextual acknowledgements in published codices in order to study how relationships inform the information-seeking habits of authors, an understudied group in library and information science. Method. A purposive sample consisting of the books from the 2010 nominations list of the Canadian Governor General's Literary Awards was chosen. An in-hand examin...

A análise documentária no grupo Temma: dos indícios às evidências da formação de unidades discursivas*

The article examines the formation of discursive units in documentary analysis in relation to the Temma Group from ECA (School of Communication and Arts) of USP (University of São Paulo), based on the theoretical-methodological confluence between bibliometrics and the Archaeology of Knowledge by Michel Foucault. It considers as the object of analysis the collection “Documentary Analysis: an analy...

Publication date: 2010